Clustering Geolocation Data Intelligently in Python

Shanshan Wang
Instructor: Ari Anastassiou
Feb. 17, 2021

We have a set of taxi rank locations and want to identify key clusters of these ranks, so that a service station can be built in each cluster to serve all taxis operating in that region.

Prerequisites

Project Outline

Task 1: Exploratory Data Analysis

Task 2: Visualizing Geographical Data

Task 3: Clustering Strength / Performance Metric

Task 4: K-Means Clustering

Task 5: DBSCAN

Task 6: HDBSCAN

Task 7: Addressing Outliers

Further Reading

Task 1: Exploratory Data Analysis
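
As a minimal sketch of what this step might look like, the snippet below checks a small table of coordinates for missing values and repeated locations before any clustering is done. The DataFrame is a made-up stand-in for the taxi rank data, and the column names `LON`, `LAT`, and `NAME` are assumptions, not the real schema.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the taxi rank dataset; values and column names are invented.
df = pd.DataFrame({
    "LON": [28.05, 28.05, 28.10, np.nan, 27.95],
    "LAT": [-26.20, -26.20, -26.15, -26.30, -26.25],
    "NAME": ["Rank A", "Rank A", "Rank B", "Rank C", "Rank D"],
})

# Count missing values per column and rows that repeat an earlier coordinate pair.
print(df.isna().sum())
print(df.duplicated(subset=["LON", "LAT"]).sum())

# Drop rows without usable coordinates, then drop repeated locations.
clean = (
    df.dropna(subset=["LON", "LAT"])
      .drop_duplicates(subset=["LON", "LAT"], keep="first")
      .reset_index(drop=True)
)

# Coordinate matrix (longitude, latitude) for the clustering steps.
X = clean[["LON", "LAT"]].to_numpy()
```

Removing repeated coordinates up front matters for density-based methods, since a stack of identical points can look like a dense region on its own.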

Task 2: Visualizing Geographical Data

Task 3: Clustering Strength / Performance Metric
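
One commonly used measure of clustering strength is the silhouette score, which ranges from -1 to 1 and is higher when points sit close to their own cluster and far from the others. The sketch below is only an illustration on synthetic points (not the taxi data): it compares a labelling that follows the structure of the data with one that ignores it.

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Two tight, well-separated synthetic blobs (invented data).
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.1, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# Labels that match the blobs versus labels that alternate regardless of position.
good_labels = np.array([0] * 50 + [1] * 50)
bad_labels = np.tile([0, 1], 50)

good = silhouette_score(X, good_labels)
bad = silhouette_score(X, bad_labels)
print(f"matching labels: {good:.3f}, alternating labels: {bad:.3f}")
```

The score needs at least two distinct labels, so it cannot be computed for a result that puts every point in one cluster.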

Task 4: K-Means Clustering
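
K-Means needs the number of clusters up front, so a common approach is to try several values of k and compare a performance metric for each. The following is a sketch on synthetic coordinates, assuming the silhouette score as the metric; the blob centres and the range of k are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)

# Three synthetic groups of (longitude, latitude) points; the centres are invented.
centers = np.array([[28.0, -26.2], [28.3, -26.0], [27.8, -25.9]])
X = np.vstack([rng.normal(loc=c, scale=0.02, size=(40, 2)) for c in centers])

# Fit K-Means for each candidate k and record the silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Keep the k with the highest score.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Note that K-Means assigns every point to a cluster, so isolated locations are never flagged as noise; that is one motivation for the density-based methods in the next tasks.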

Task 5: DBSCAN

Density-Based Spatial Clustering of Applications with Noise
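
DBSCAN groups points that lie in dense regions and labels points in sparse regions as noise (label `-1`), without needing the number of clusters in advance. A minimal sketch on synthetic coordinates is shown below; the values of `eps` and `min_samples` are assumptions chosen to suit this toy data, not recommended settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)

# Two dense synthetic groups plus three isolated points (all invented).
blob_a = rng.normal(loc=[28.0, -26.0], scale=0.01, size=(60, 2))
blob_b = rng.normal(loc=[28.5, -26.0], scale=0.01, size=(60, 2))
isolated = np.array([[29.0, -27.0], [27.0, -25.0], [29.0, -25.0]])
X = np.vstack([blob_a, blob_b, isolated])

# eps is a distance in the same units as the coordinates (degrees here).
model = DBSCAN(eps=0.05, min_samples=5)
labels = model.fit_predict(X)

# Noise points carry the label -1 and are not counted as a cluster.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```

Because a single `eps` applies everywhere, DBSCAN can struggle when clusters have very different densities.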

Task 6: HDBSCAN

Hierarchical DBSCAN

Task 7: Addressing Outliers
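
Density-based methods leave some points labelled as noise, yet every taxi rank still needs a service station. One possible way to handle this, shown here only as a sketch under that assumption, is to train a nearest-neighbour classifier on the clustered points and use it to assign each noise point to the closest cluster. The data and parameter values are synthetic.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)

# Two dense synthetic groups plus three isolated points (all invented).
blob_a = rng.normal(loc=[28.0, -26.0], scale=0.01, size=(60, 2))
blob_b = rng.normal(loc=[28.5, -26.0], scale=0.01, size=(60, 2))
isolated = np.array([[27.5, -26.0], [29.0, -26.0], [28.0, -25.0]])
X = np.vstack([blob_a, blob_b, isolated])

# Step 1: a density-based clustering that marks outliers with -1.
labels = DBSCAN(eps=0.05, min_samples=5).fit_predict(X)
is_noise = labels == -1

# Step 2: fit a 1-nearest-neighbour classifier on the clustered points only.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X[~is_noise], labels[~is_noise])

# Step 3: give each noise point the cluster label of its nearest clustered point.
hybrid = labels.copy()
hybrid[is_noise] = knn.predict(X[is_noise])
print(int(is_noise.sum()), np.unique(hybrid))
```

A larger `n_neighbors` would make the assignment depend on several nearby clustered points rather than only the closest one.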

Further Reading

For additional reading, feel free to check out the K-Means, DBSCAN, and HDBSCAN clustering methods.

It may also be useful to look at the other commonly used forms of clustering available in the scikit-learn library. The HDBSCAN documentation also includes a good methodology for choosing a clustering algorithm based on your dataset and other limiting factors.